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Abstract 



The popular neighbor-joining (NJ) algorithm used in phylogenetics is a greedy algorithm for 
finding the balanced minimum evolution (BME) tree associated to a dissimilarity map. From 
this point of view, NJ is "optimal" when the algorithm outputs the tree which minimizes the 
balanced minimum evolution criterion. We use the fact that the NJ tree topology and the BME 
tree topology are determined by polyhedral subdivisions of the spaces of dissimilarity maps 

JRj^ to study the optimality of the neighbor-joining algorithm. In particular, we investigate 
and compare the polyhedral subdivisions for n < 8. A key requirement is the measurement 
of volumes of spherical polytopes in high dimension, which we obtain using a combination of 
Monte Carlo methods and polyhedral algorithms. We show that highly unrelated trees can 
be co-optimal in BME reconstruction, and that NJ regions are not convex. We obtain the I2 
radius for neighbor-joining for n = 5 and we conjecture that the ability of the neighbor-joining 
algorithm to recover the BME tree depends on the diameter of the BME tree. 

1 Introduction 

The popular neighbor-joining algorithm used for phylogenetic tree reconstruction [T3] has re- 
cently been "revealed" to be a greedy algorithm for finding the balanced minimum evolution 
tree associated to a dissimilarity map |7j . This means the following: 

Let D = {djj}"j=i be a dissimilarity map (this is an n x n symmetric matrix with zeroes 
on the diagonals and non-negative real entries). The balanced minimum evolution problem is to 
find the tree T that minimizes 



Here o(T) is the set of all cyclic permutations of the leaves that arise from planar embeddings 
of T. Denote by pfj the set of internal vertices in a tree T on the path between i and j. Then 
is equivalent to minimizing 



where Xfj = Yiv&p'^X'^^di'^) ~ if ^ 7^ J and = 0. This is an A^P-hard linear programming 
problem [3] whose relevance is given by the following theorem: 
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Definition 1.1. Let T be a tree with n leaves and I : E(T) IR+ an assignment of lengths to 
the edges. Then the length of T with respect to I is defined as 



e&E{T) 



Theorem 1.2. ([6]) Let T he a binary tree with edge lengths given by I : E{T) — > and 
D = {dij}"j=i a dissimilarity map. If the variance of dij is proportional to (i.e., var{dij) = 

c2'^»-' ' for some constant c) then is the minimum variance tree length estimator of T . More- 
over, the weighted least squares tree length estimate is equal to 

This resuh provides a weighted least squares rationale for the minimization of (|2]), and highlights 
the importance of understanding the balanced minimum evolution polytope: 

Definition 1.3. The balanced minimum evolution polytope is the convex hull of the vectors 

{ [^12 > ^^13) • • • 5 ^i;') • • • 1 -^n-in] '■ T is a trcc with n leaves} 

Example. There are four trees with n = 4 leaves. They are the 3 binary trees and the star- 
shaped tree. In this case the balanced minimum evolution polytope is the convex hull of the 
vectors: 



T is the tree with leaves 1,2 separated from 3,4, 
T is the tree with leaves 1,3 separated from 2,4, 
T is the tree with leaves 1,4 separated from 2,3, 
T is the star-shaped tree. 
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The balanced minimum evolution polytope in this case is a triangle in IR^. Note that the 
star-shaped tree is in the interior of the triangle. 

For any dissimilarity map, the trees which minimize ([2|) will be vertices of the balanced minimum 
evolution polytope; these are always the binary trees. In fact, for such trees A^- = 2~'^»j' (this is 

Pauplin's formula [16j). The BME polytope lies in IR^a) and has dimension (2) — n. The normal 
fan [T3^ of the BME polytope gives rise to BME cones which form a polyhedral subdivision of 

the space of dissimilarity maps IR^^ . They describe, for each tree T, those dissimilarity maps 
for which T minimizes Q. We discuss the polytope in more detail in Section 2. 

The neighbor-joining algorithm is a greedy algorithm for finding an approximate solution to 
([2]). We omit a detailed description of the algorithm here (readers can consult [7J), but we do 
mention the crucial fact that the selection criterion is linear in the dissimilarity map [2 . Thus, 
the N J algorithm will pick pairs of leaves to merge in a particular order and output a particular 
tree T if and only if the pairwise distances satisfy a system of linear inequalities, whose solution 
set forms a polyhedral cone in IR^). We call such a cone a neighbor- joining cone, or NJ cone. 
Thus the NJ algorithm will output a particular tree T if and only if the distance data lies in a 
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union of NJ cones. In Section 3 we show that the NJ cones partition IR^z), but do not form a 
fan. This has important impHcations for the behavior of the NJ algorithm. 

Our main result is a comparison of the neighbor-joining cones with the normal fan of the 
balanced minimum evolution polytope. This means that we characterize those dissimilarity maps 
for which neighbor-joining, despite being a greedy algorithm, is able to identify the balanced 
minimum evolution tree. These results are discussed in Section 4. 



2 The balanced minimum evolution polytope 

Throughout this paper, let {1, • • • ,n} be the set of taxa. Recall there are 2n — 3 edges in an 
unrooted tree with n leaves. For a fixed tree topology T, let Bt be the (2) x (2n — 3) matrix 
with rows indexed by pairs of leaves and columns indexed by edges in T defined as follows: 



BT{{a,b},e) 



1 if edge e is in the path from leaf a to leaf b, 
otherwise. 



By 



For example, for the tree in Figure |2j 

/ 1 1 \ 
10 10 10 
110 10 
10 10 11 
10 10 11 

1 1 1 

1 1 1 1 
10 111 
10 10 1 

V 1 1 / 

Given edge lengths / : E(T) IR+ we let b be the vector with components /(e) as e ranges over 
E(T). Any dissimilarity map d (encoded as a row vector) can now be written as 

d = Bxh + e. 

where e is a vector of "error" terms that are zero when d is a tree metric. 

The weighted least squares solution for the edge lengths b assuming a variance matrix V 
with diagonal entries Vij = Xjj (as defined in the introduction) and dissimilarity map d is given 
by 



b = {B'j^V-'BTy'B'TVd, 

where •* denotes matrix transpose. The length of T with respect to the least squares edge lengths 
is then 

1{T) = VT • d, 

where vj^ = V^^ B^iBtpV^^ Bt)^^! and 1 is the vector of all I's. We call the vectors v^^ the 
balanced minimum evolution vectors (or BME vectors). In the case of Figure [2| the BME vector 
is 

"1111111111" 
2' 4' 8' 8' 4' 8' 8' 4' 4' 2 



2 THE BALANCED MINIMUM EVOL UTION POLYTOPE 



4 



T^leaves 


dim(BME polytope) 


/-vector 


4 


2 


(3,3) 


5 


5 


(15, 105, 250, 210, 52) 


6 


9 


(105, 5460, ?, ?, ?, 90262) 


7 


14 


(945, 445410, ?, ?, ?, ?, ?) 








n 


Q-n 


((2n-5)!!,?,...) 



Table 1: The /-vector for small BME polytopes. 



The BME method is equivalent to minimizing the linear functional • d over all BME 
vectors for all tree topologies T. The BME polytope is the convex hull of all BME vectors in 
The following facts follow from the definition of the balanced minimum evolution tree: 

Lemma 2.1. The vertices of the BME polytope are the BME vectors of binary trees. The BME 
vector of the star phylogeny lies in the interior of the BME polytope, and all other BME vectors 
lie on the boundary of the BME polytope. 

The normal fan |13j of a BME polytope partitions the space of dissimilarity maps into 
cones, one for each tree. We call these BME cones. They completely characterize the BME 
method: T is the BME tree topology if and only if the dissimilarity map D lies in the BME 
cone of T. 

For a node a its shift vector is the dissimilarity map in which a is distance 1 from all other 
leaves, and all other distances are 0. According to [16], for a tree T, {yT)ab gives the probability 
that a will immediately precede 6 in a random circular ordering of T. Thus the dot-product of a 
BME vector with a shift vector must necessarily equal 1, and in fact the lineality space of BME 
cones is spanned by shift vectors. So when we describe a BME cone we will always describe just 
the pointed component, i.e. modulo the lineality space of shift vectors. 

As part of our computational study, we computed the BME polytope and BME cones for 
trees with n = 4,5,6,7,8 leaves using the software polymake [8j. In Table 1 we display some 
of the components of /-vectors we were able to compute. This provides information about the 
polytopes: the ith component of an /-vector of a polytope is the number of faces of dimension 
i — 1. For example, the first component in each vector in Table 1 is the number of 0-dimensional 
faces (vertices) of the corresponding BME polytope, i.e., the number of binary trees. 




Figure 1: The non-edges on the BME polytope for n = 7. Two trees will form a non-edge if 
and only if they are 3-cherry trees that differ by the pair of leaf exchanges shown in the figure. 
There are two ways to perform each leaf-exchange, so each binary tree with three cherries is not 
adjacent to 4 trees. 



3 NEIGHBOR-JOINING CONES 



5 



We found that the edge graph of the BME polytope is the complete graph for n = 4,5,6 
which means that for every pair of trees Ti and T2 with the same number (< 6) of leaves, there 
is a dissimilarity map for which Ti and T2 are (the only) co-optimal BME trees. However, for 
n = 7, the BME polytope does in fact have one combinatorial type of non-edge. Namely, two 
bifurcating trees with seven leaves and three cherries (two leaves adjacent to the same node in 
the tree) will form a non-edge if and only if they are related by two leaf exchanges as depicted 
in Figure [T| This completely characterizes the non-edges for n = 7. It is an interesting open 
problem to characterize the non-edges of the BME polytope in general. 

3 Neighbor-joining cones 

The neighbor-joining algorithm takes as input a dissimilarity map and outputs a tree. The tree 
is constructed "one cherry at a time" . This means that at each step leaves o, b are picked to be 
a cherry by minimizing the Q-criterion. The Q-criterion is given by the formula 

n n 

Qab ■= {n - 2)dab - ^^dak - ^Yl 

k=l k=l 

The nodes o, h are replaced by a single node z, and new distances d^k are obtained by a 
straightforward linear combination of the original pairwise distances: dzk '■= \{dak + d^k — dab)- 
Then the NJ method is applied recursively. 

We note that since new distances dzk are always linear combinations of the previous distances, 
all Q-criteria computed throughout the NJ algorithm are linear combinations of the original 
pairwise distances. Thus, for a fixed n, for every possible ordering a of picked cherries that 
results in one of the trees T with n leaves there is a polyhedral cone of dissimilarity 

maps. The set of all neighbor-joining cones is denoted by C„. Their union Uc6C„ ^ 

of Ir(2), and the intersection of any two cones is a subset — but not necessarily a face — of the 
boundary of each of the cones. Given an input from the interior of C^, the NJ algorithm will pick 
the cherries in the order a and output the corresponding tree. For inputs d on the boundary of 
one (and therefore at least two) of the cones, the order in which NJ picks cherries is undefined, 
because at some point there will be two cherries both of which have minimal Q-criterion. We 
call the cones Ca neighbor- joining cones, or NJ cones. 

Example. There is only one unlabeled binary tree with 5 leaves and there are 15 distinct labeled 
trees. For each labeled tree, there are two ways in which a cherry might be picked by the NJ 
algorithm in the first step. For instance, neighbor-joining applied to any dissimilarity map in 
Ci2,45 or C45^i2 will produce the tree in Figure |2j There are a total of 30 NJ cones for n = 5. 

We note that all Q-criteria for shift vectors equal —2, so adding any linear combination of 
shift vectors to a dissimilarity map does not change the relative values of the Q-criteria. Also, 
after picking a cherry, the reduced distance matrix of a shift vector is again a shift vector. Thus, 
for any input vector d, the behavior of the NJ algorithm on d will be the same as on d -|- s if s 
is any linear combination of shift vectors. In fact it can be shown that the lineality space of NJ 
cones is spanned by shift vectors, just as for BME cones. So from now on, when we refer to NJ 
cones, we will mean the pointed portion of the cone, i.e. modulo the lineality space. 

Theorem 3.1. The cones in Cn are not the normal fan of any polytope for n > 5. 
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Figure 2: A tree with five leaves. 



Type 


rays 


Cones 




(-3,5,- 


3,-1,5,-3,-1,1,1, 


-1) 


^23,455 ^23,15? ^-'23,145 ^12,34? *-^34,12 




(-3,5,- 


-3,-1,1,1,-1,5,-3, 


-1) 


<-^23,45, <--23,15, L'23,14, L-12,35' <-^35,12 


I 


(5,-3,- 


-3,-1,-3,5,-1,1,1, 


-1) 


<-^23,45, <-^23,15, <-^23,14, <-^24,13' '^13,24 




(1,1,-3 


,-1,-3,5,-1,5,-3, 


-1) 


<-^23,45, <-^23,15, (-^23,14, ^^25,13' ^25,13 




^5 — 3 - 


3,-1,1,1,-1,-3,5, 


-1) 


<-^23,45, <-^23,15, L-23,14, L'24,35' ^35,24 




(1,1,-3 


, —1,5, —3, —1, — 3, 5, 


-1) 


'-^23,45, '-^23,15, '-'23,14, >-'25,34, ^25,34 




/-II 

(-1,1, 


-1,1,1,-1,-1,1,1, 


-1) 


Cl2,45, 6*12,34 , 6*23,45, C'23,15, 6*34,15, 
6*34,12, 6*45,23, 6*45,12, 6*15,34, 6*15,23 


II 


(-1,1, 


-1,-1,1,1,1,1,-1, 


-1) 


612,45, 6*12,35, 623,45, 623,14, 635,14, 
6*35,12 , 6*45,23, 6*45,12, 6*14,35, 614,23 


(1,1,- 


1,-1,-1,1,-1,1,-1,1) 


625,14, 6*25,13, 623,14, 623,45, 613,45, 
6*13,25, 614,23, 614,25, 645,13, 6*45,23 




(1,-1,- 


-1,1,-1,1,-1,1,1,- 


-1) 


624,15, 6*24,13, 623,15, 623,45, 613,45, 
6*13,24, 615,23, 615,24, 645,13, 645,23 




(1,-1,- 


-1,1,1,-1,-1,-1,3, 


-1) 


6*23,45, 623,15, 612,45, 612,35, 624,15, 6*24,35, 
6*35,24, 6*35,12, 6*15,24, 6*15,23, 6*45,12, 6*45,23 


III 


(1,-1,- 
(1,-1,- 


1,-1,-1,3,1,1,-1, 
1,1,1,-1,-1,-1,3, 


-1) 
-1) 


6*23,45 , 6*23,14, 6*12,45, 6*12,34, 6*25,14, 6*25,34, 
6*34,25, 6*34,12, 6*14,25, 6*14,23, 6*45,12, 6*45,23 

6*23,45, 623,15, 613,45, 6*13,25, 634,15, 6*34,25, 
625,34, 625,13, 6*15,34 , 6*15,23 , 6*45,13, 645,23 




(1,-1,- 


-1,-1,-1,3,1,1,-1, 


-1) 


623,45, 623,14, 613,45, 613,24, 635,14, 635,24, 
624,35, 624,13, 6*14,35, 6*14,23, 6*45,13, 6*45,23 



Table 2: The 14 rays of the cone 623,45. Each ray is determined by a vector shown in the second 
column. The third column shows, for each ray, which cones it belongs to. If a cone is starred 
then the ray is inside the cone, but not a ray of it. 



To prove this theorem it is necessary to understand the geometry of the NJ cones. We describe 
the case n = 5 in detail; it also suffices to prove the theorem. 

We begin by noting that all of the NJ cones are equivalent under the action of the symmetric 
group on five elements (6*5), where an element of ^5 permutes the five taxa or, equivalently, the 
rows and columns of the input distance matrix. Each NJ cone is defined by ( (2) — 1) + ( (2) — 1) = 
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14 inequalities that are implied by the Q-criteria as the NJ algorithm picks the two cherries. 
The cones are 5-dimensional, and their intersection with a suitable hyperplane leaves a four 
dimensional polytope P. The /-vector of P is (14, 32, 27, 9). 

The 30 cones share many of their rays, giving a total of 82 rays which decompose into three 
orbits under the action of 55. We refer to the types of rays as Type I, Type II and Type III. 
Each cone has 6 rays of type I, 4 rays of type II and 4 rays of type III. Each ray of type I is the 
common ray of 3 cones, and belongs to 2 other cones of which it is not a ray (i.e. it is in the 
interior of a face). Note that this implies that the cones cannot form a fan. The type II rays 
are contained in 10 cones each, and the type III rays in 12. Type II and III rays are rays of all 
cones which contain them. For the cone 023,45, this information is tabulated in Table 2. 

We note that the rays of NJ cones are minimal intersections of NJ cones, and thus give 
dissimilarity maps for which the NJ algorithm is least stable. 

Example. Consider two alignments of 5 sequences that are to be used to construct a tree. 
These may consist of two different genes and for each of them the homologs among 5 genomes. 
Suppose that distances are estimated using the Jukes-Cantor correction O [13] separately for 
each set of sequences. That is, for the first set of sequences 



log(l-;;/r 



where fij is the fraction of different nucleotides between sequences i and j in the first set and 
for the second set 

3 4 
{D2)ij = -^log(l - -gij) 

where g^j is the fraction of different nucleotides between sequences i and j in the second set. 
If the fractions fij and g^j are given by 



/ 
0.054187 
0.151108 
0.368136 
V 0.054198 



0.054187 


0.151117 
0.054198 
0.36813 



0.151108 
0.151117 


0.054187 
0.054198 



0.368136 
0.054198 
0.054187 


0.151108 



0.054198 \ 
0.36813 
0.054198 
0.151108 




and 



/ 
0.151068 
0.05414 
0.368161 
V 0.104517 



then we obtain 



D2 





0.056244 
0.168744 
0.506257 
V 0.056256 

/ 
0.168694 
0.056194 
0.506306 
V 0.112556 



0.151068 


0.054245 
0.054245 
0.395699 

0.056244 


0.168755 
0.056256 
0.506245 

0.168694 


0.056307 
0.056307 
0.562445 



0.05414 
0.054245 


0.151068 
0.194428 

0.168744 
0.168755 


0.056244 
0.056256 

0.056194 
0.056307 


0.168694 
0.225056 



0.368161 
0.054245 
0.151068 


0.104421 

0.506257 
0.056256 
0.056244 


0.168744 

0.506306 
0.056307 
0.168694 


0.112444 



0.104517 \ 
0.395699 
0.194428 
0.104421 

/ 

0.056256 \ 
0.506245 
0.056256 
0.168744 

/ 

0.112556 \ 
0.562445 
0.225056 
0.112444 
/ 



and 
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Notice that the vector representation of Di lies in the cone Ci2,45 and the vector representation 
of D2 lies in the cone (745^12. Thus NJ returns the same tree topology for both Di and D2. 

If we concatenate the ahgnments and combine the data to build one tree, then we estimate 
the distances using the average of / and g: 



( 0.102628 0.102624 0.368148 0.079357 \ 

0.102628 0.102681 0.054222 0.381915 

0.102624 0.102681 0.102628 0.124313 

0.368148 0.054222 0.102628 0.127765 



V 0.079357 0.381915 0.124313 0.127765 







Using this frequency matrix we obtain the distance matrix Z?3 via the Jukes-Cantor correction: 



/ 
0.110364 
0.110359 
0.506281 
\ 0.083878 



0.110364 


0.110425 
0.056281 
0.533818 



0.110359 
0.110425 


0.110364 
0.135917 



0.506281 
0.056281 
0.110364 


0.140066 



0.083878 \ 
0.533818 
0.135917 
0.140066 
/ 



However, the vector representation of Dj, lies in the cone C24,i5, which means that neighbor- 
joining returns a different tree topology for D^- This example provides a distance-based recon- 
struction analog to the recent mixture model results of [10] . 



An analysis of the rays of Cn suffices to prove Theorem |3.1.| but the facet structure of each 
cone is also informative, and we were able to obtain complete information for n = 5. The types 
of facets constituting each cone are shown in Figure [3| Each cone consists of one Type A facet, 
two Type B facets, two Type C facets and four Type D facets. These facets intersect as follows: 
Type A facets are shared by pairs of cones of the form Cab,cd, Ccd,ab- Type B facets are shared 
by pairs of cones of the form Cab,de, Cab,ce] there are two such pairs for each cone. Two of the 
square facets of a Type A facet belong to Type B facets, and a pair of Type B facets share a 
hexagon consisting of six Type I rays. The remaining two square facets of a Type A facet form 
Type C facets with two Type I rays. The four triangular facets of a Type A facet form Type D 
facets (Egyptian pyramids) with two Type I rays. 

We used our description of the NJ cones to examine the I2 distance between tree metrics 
and the boundaries of NJ cones. Without loss of generality, by shifting the leaves that in the 
cherries, we can assume the tree metric is of the form 



Dt 



( 


a 

a + 13 
\ a + (5 






a 

a + (3 
a + (3 



a 
a 


/3 
/3 



a + P 
a + (3 

P 





a + P \ 

a + /3 



/ 



where a and P are the internal branch lengths, a > ^ and a + (3 = 1. It is easy to see that 
Dt S Ci2,45 confirming the consistency of neighbor-joining. The cone 6*12,45 contains 9 faces, 
but we may ignore one of them (6*45,12) as it corresponds to the same tree. The distance to the 
closest of the remaining eight faces is 



d{DT, (Ci2,45 U 645,12)'=) 



1 



a 



We summarize this as follows: 
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Figure 3: The four types of facets of P. 
Theorem 3.2. The I2 radius of neighbor- joining for 5 taxa is ^ ^ 0.5773. 

V 3 

This is shghtly larger than the /qo radius of ^ given by Atteson's theorem [1]. It is an interesting 
problem to compute the I2 radius for neighbor-joining with more taxa. 

The description of the NJ cones we have provided can also be used in practice to evaluate 
the robustness of the algorithm when used with a specific dataset. For n = 5, we examined data 
simulated from subtrees of the two tree models Ti and T2 in [12 with the Jukes-Cantor model 
and the Kimura 2-parameter models [13]. For each of 40,000 simulations, we calculated the 
^2-distance between the NJ cone of the given tree and the maximum likelihood estimates for the 
pairwise distances (see supplementary material). These show that in many cases the maximum 
likelihood estimates lie very close to the boundary. In such cases, one must conclude that the 
NJ tree is possibly incorrect due to the variance in the distance estimates. 

4 Optimality of the neighbor-joining algorithm 

In order to study the optimality of the neighbor-joining algorithm, we compared the BME 
cones with the NJ cones. Such a comparison involves intersecting the cones with the ((2) — 1)- 
sphere (in the first orthant) and then studying the volumes of their intersection by computing 
the standard Euclidean volume of the resulting surfaces. These surfaces are an intersection of 
closed hemispheres, i.e. spherical polytopes. Computing Euclidean volumes of (non-spherical) 
polytopes is a standard problem that is usually solved by triangulating and summing the volumes 
of the simplices. However there has been no publicly available software developed for computing 
or approximating volumes of spherical polytopes of dimension > 3 using this method. One 
possible reason for this is that in higher dimensions the volumes of spherical simplices are given 
by complicated analytical formulas [15] whose computational complexities are unknown. 
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#taxa 


tree shape 


#trees 


NJ vol 


BME vol 


NJ accuracy 


4 


unique 


3 


100% 


100% 


100% 


5 


unique 


15 


100% 


100% 


98.06% 


6 


3-cherry 


15 


18.50% 


18.57% 


90.39% 


6 


caterpillar 


90 


81.54% 


81.45% 


91.33% 


7 


3-cherry 


315 


45.32% 


44.58% 


82.42% 


7 


caterpillar 


630 


54.68% 


55.42% 


78.85% 


8 


4-cherry 


315 


6.48% 


6.36% 


70.12% 


8 


3-cherry 


2520 


27.12% 


25.84% 


69.93% 




(two are neighbors) 










8 


3-cherry 


2520 


35.67% 


34.54% 


71.63% 




(none are neighbors) 










8 


caterpillar 


5040 


30.73% 


33.24% 


61.75% 



Table 3: Comparison of NJ and BME cones. 

We implemented two approaches in MATLAB (using polymake as a preprocessing step) for 
approximating the volume of a spherical polytope P. One approach is trivial: it simply samples 
uniformly from the sphere, and counts how many points are inside P. This approach is partic- 
ularly suitable if P has large volume, or if many spherical polytopes are being simultaneously 
measured which partition the sphere, as is the case for NJ and BME cones. The second approach 
is suitable for spherical polytopes having small volume. We used this approach for computing 
the volumes of consistency cones [11^ which we discuss briefly in the Discussion section. 

The second approach begins by computing a triangulation of the vertices of P with additional 
interior points of P added. This triangulation defines a simplicial mesh M which is obtained 
by replacing each spherical simplex with the corresponding Euclidean simplex having the same 
vertices. The volume of M (i.e. the sum of the volumes of the simplices in the mesh) is already 
an approximation to the volume of P. We refine this estimate by Monte Carlo estimation of the 
average value of the Jacobian from M to P. This requires sampling uniformly from M, which 
is straightforward and can be done very quickly in 0(m + kd + klogk) time, where m is the 
number of simplices in the mesh, k is the number of samples, and d is the dimension. 

Our main results on the the optimality of NJ for n = 5, 6, 7, 8 taxa are summarized in Table 
3. Each row of the table describes one type of tree. Trees are classified by their topology. A 
/c-cherry tree is a tree with k cherries. The NJ volume column shows the volume of that part of 
the positive orthant of dissimilarity maps for which the NJ tree is of the specified type. Similarly, 
the BME volume column shows the same statistic for BME trees. Finally, NJ accuracy shows 
the fraction of the BME cone that overlaps the NJ cone. In other words, NJ accuracy is a 
measure of how frequently NJ will find the BME tree for a dissimilarity map that is chosen at 
random. 

We also classified and measured the intersections of NJ and BME cones in which the NJ tree 
differs from the BME tree. Many of these intersection cones are equivalent under the action of 
Sn on the leaf labels, particularly as the stabilizer of the BME tree permutes the leaf labels in 
the NJ tree. In fact, for n = 5 taxa there are only three types of mistakes that the NJ algorithm 
can make when it fails to reproduce the BME tree. These are depicted in Figure |4] and the 
normalized spherical volumes of corresponding NJ/BME intersection cones are given. 

Figure 4 can be interpreted as follows: For a random dissimilarity map, if the NJ algorithm 
does not produce the BME tree, then with probability 0.67 it produces the tree on the right. 
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Figure 4: Frequencies of the three possible types of NJ trees that may picked instead of the 
BME tree for n = 5 leaves. Neighbor-joining agrees with the BME tree 98.06% of the time. 

and if not then it almost always produces the tree in the middle. This tree differs from the BME 
tree significantly. A surprising result is that the tree on the left is almost never the NJ tree. We 
believe that a deeper understanding of the "mistakes" NJ makes when it does not optimize the 
balanced minimum evolution criterion may be important in interpreting the results, especially 
for large trees. 

We also computed analogous results for n = 6, 7, 8, 9, 10. They are available, together with 
the software for computing volumes, at the supplementary materials website 

http : //bio . math . berkeley . edu/N JBME 



5 Discussion 

Theoretical studies of the neighbor-joining algorithm have focused on statistical consistency 
and the robustness of the algorithm to small perturbations of tree metrics. The paper by [17] 
established the consistency of NJ, that is, if Dt is a tree metric then NJ outputs the tree T. 
This result was then extended in |1] and more recently [11] who show that if D is "close" to a 
tree metric Dt for some T, then NJ outputs T on input D. 

Our results provide a different perspective on the NJ algorithm. Namely, we address the 
question of the accuracy of the greedy approach for the underlying linear programming problem 
of BME optimization. This led us to the study of BME polytopes, and the combinatorics of 
these polytopes is interesting in its own right: 

Question 5.1. Is there a combinatorial criterion for two tree topologies forming an edge in the 
BME polytope, similar to pruning/re-grafting or some other operation on trees? If so, this could 
be used to define a combinatorial pivoting rule on tree space that could be used in hill-climbing 
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algorithms for phylogenetic reconstruction. Such a pivoting rule would have the advantage that 
it would be equivalent to performing an edge-walk on the BME polytope. Edge-walking methods 
are known to perform well in practice for solving linear programs. See for an example of a 
local search approach to finding minimum evolution trees. 

Similarly, a better understanding of the combinatorics of the NJ cones will lead to a clearer 
view of the strengths and weaknesses of the neighbor-joining algorithm. A basic problem is the 
following: 

Question 5.2. Find a combinatorial description of the NJ cones for general n. How many 
facets/rays are there? 

Our computational results lend new insights into the performance of the NJ and BME 
algorithms for small trees. We have measured the relative sizes of cones for different shapes of 
trees, and measured the frequencies of the all combinatorial types of discrepancies between BME 
and NJ trees. In particular, we have observed that the NJ algorithm is least likely to reproduce 
the BME tree when the BME tree is the caterpillar tree. 

Conjecture 5.3. The caterpillar tree yields the smallest ratio of spherical cone volumes vol(NJ 
n BME) / vol(BME) where NJ is the spherical cone volume of a union of the NJ cones and BME 
is the spherical cone volume of the BME cone for a fixed tree. In other words, the caterpillar 
tree is the most difficult BME tree topology for the NJ algorithm to reproduce. 

Another problem we believe is very important is to extend the results shown in Figure 4 
to large trees. In other words, to understand how neighbor-joining can fail when it does not 
succeed in finding the balanced minimum evolution tree. 

Question 5.4. What tree topologies is neighbor-joining likely to pick when it fails to construct 
the balanced minimum evolution tree? 

There are many other interesting cones related to distance-based methods that can be con- 
sidered in this context. For example, in [TT], it is shown that the quartet consistency condition 
is sufficient for neighbor-joining to reconstruct a tree from a dissimilarity map for n <7 leaves. 
The quartet consistency conditions define polyhedral cones (consistency cones) in (see 
for details). For n = 4 taxa the consistency cones cover ah of Ir(2) showing that quartet 
consistency explains the behavior of neighbor-joining for all dissimilarity maps. Using the sec- 
ond method outlined in Section 4 we succeeded in computing the volumes of the consistency 
cones intersected with the first orthant of the sphere for n = 5 taxa. There are 15 cones, all 

(5) 

equivalent under orthogonal transformation, and their union covers 27.93% of IR_j_^ , measured 
with respect to spherical volume. In other words, quartet consistency explains the behavior of 
neighbor-joining on almost ^ of dissimilarity maps. 

Such computations are pushing the boundary of computational polyhedral geometry. For 
n > 6 taxa, triangulating a consistency cone is too unwieldy, although we are confident that 
spherical volumes could still be computed using polynomial time hit-and-run sampling methods 
for volume approximation [4^- Such methods are complicated and not yet implemented. 

Finally, we comment on the example in Section 3 that shows how to different alignments 
may lead to the same neighbor-joining tree, whereas the neighbor-joining tree constructed from 
a concatenation of the alignments is different. This result has significant implications for studies 
where species trees are constructed from multiple gene families by combining the data. 
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